Exploring the Potential Role of Upper Abdominal Peritonectomy in Advanced Ovarian Cancer Cytoreductive Surgery Using Explainable Artificial Intelligence

Simple Summary
The Surgical Complexity Score (SCS) has been widely used to reflect the surgical effort during advanced stage epithelial ovarian cancer (EOC) cytoreduction. However, not all surgical procedures are described by this score. Using artificial intelligence, we developed and explained an algorithm that weighted the importance of all surgical procedures for the prediction of complete cytoreduction (CC0). We identified upper abdominal peritonectomy (UAP) as the most salient procedural predictor of CC0, followed by pelvic and para-aortic lymph node dissection and ileocecal resection/right hemicolectomy. The UAP was predictive of poorer progression-free survival but not overall survival. The SCS did not impact survival. We advocate thorough early inspection of the upper abdominal quadrants to ensure that CC0 is achievable.

Abstract
The Surgical Complexity Score (SCS) has been widely used to describe the surgical effort during advanced stage epithelial ovarian cancer (EOC) cytoreduction. Referring to a variety of multi-visceral resections, it combines the number of sub-procedures with their complexity. Nevertheless, not all potential surgical procedures are described by this score. Lately, the European Society of Gynaecological Oncology (ESGO) has established standard outcome quality indicators pertinent to achieving complete cytoreduction (CC0). There is a need to define the weight that each of the surgical sub-procedures contributing to CC0 should be given. Prospectively collected data from 560 surgically cytoreduced advanced stage EOC patients were analysed at a UK tertiary referral centre. We adapted the structured ESGO ovarian cancer report template. We employed the eXtreme Gradient Boosting (XGBoost) algorithm to model a long list of surgical sub-procedures. We applied the Shapley Additive explanations (SHAP) framework to provide global (cohort) explainability. We used Cox regression for survival analysis and constructed Kaplan-Meier curves.
The XGBoost model predicted CC0 with an acceptable accuracy (area under curve [AUC] = 0.70; 95% confidence interval [CI] = 0.63–0.76). Visual quantification of the feature importance for the prediction of CC0 identified upper abdominal peritonectomy (UAP) as the most important feature, followed by regional lymphadenectomies. The UAP best correlated with bladder peritonectomy and diaphragmatic stripping (Pearson's correlations > 0.5). Clear inflection points were shown by pelvic and para-aortic lymph node dissection and ileocecal resection/right hemicolectomy, which increased the probability for CC0. When UAP alone was added to a composite model comprising engineered features, it substantially enhanced the model's predictive value (AUC = 0.80, CI = 0.75–0.84). The UAP was predictive of poorer progression-free survival (HR = 1.76, CI 1.14–2.70, P: 0.01) but not overall survival (HR = 1.06, CI 0.56–1.99, P: 0.86). The SCS did not have a significant survival impact. Machine Learning allows for operational feature selection by weighting the relative importance of those surgical sub-procedures that appear to be more predictive of CC0. Our study identifies UAP as the most important procedural predictor of CC0 in surgically cytoreduced women with advanced-stage EOC. The classification model presented here can potentially be trained with a larger number of samples to generate a robust digital surgical reference in high-output tertiary centres. The upper abdominal quadrants should be thoroughly inspected to ensure that CC0 is achievable.


Introduction
In the western world, epithelial ovarian cancer (EOC) is the fifth most common cause of women's cancer-related death [1]. Most women are diagnosed at an advanced stage (stage III or IV), mainly due to the lack of sufficient diagnostic tools. The current gold standard treatment is cytoreductive surgery combined with carboplatin and paclitaxel chemotherapy and subsequent maintenance therapy [2,3]. Such complex treatment algorithms often require extensive surgical procedures including peritoneal stripping, diaphragmatic, splenic, liver, and gastrointestinal resections [4,5]. Complete cytoreduction (CC0) and chemotherapy response appear to be the most critical prognostic factors [6].
Achieving CC0 frequently requires targeted maximal effort. Previous attempts to describe the extent of cytoreductive surgery led to the development of the surgical complexity score (SCS), which combined the number of procedures with their complexity [6]. Nevertheless, not all potential surgical procedures are described by this score. Lately, the European Society of Gynaecological Oncology (ESGO) has established ten quality indicators (QIs), based on the standards of practice, to audit and improve advanced EOC surgery [7]. Three of these QIs were outcome indicators related to achievement of CC0. In the complex environment of the operating room, CC0 is not always realized. Inconsistency among surgeons in the interpretation of the size of residual disease has been reported, prompting accurate documentation of operative findings and outcomes in the surgical notes [8]. QI8, a process indicator, related to prospectively recorded information from an exhaustive list of structured surgical procedures as "minimum required elements in operative reports" [9]. There is a need to define the weight that each of the surgical procedures contributing to CC0 should be given. Therefore, most surgeons should regularly seek objective but personalised strategies to evaluate their cytoreductive outcomes.
In the era of precision oncology, Artificial Intelligence (AI) could potentially support clinicians in making meaningful predictions of the surgical outcomes for quality improvement and delivery of modern ovarian cancer care [10]. We previously employed such innovative solutions to predict outcomes of cytoreductive surgery in advanced EOC [11,12]. Herein, we developed an AI algorithm to support the weighted importance of all surgical procedures performed at EOC cytoreductive surgery for CC0 forecasting. Using eXplainable Artificial Intelligence (XAI), we examined and interpreted the most salient procedural interactions to explain the overall model predictive performance.

Materials and Methods
The study was a single-center retrospective cohort study including patients treated at our ESGO accredited center of excellence for advanced ovarian cancer surgery between 2014 and 2019. All consecutive incoming women with newly diagnosed advanced stage EOC who underwent surgery during their primary therapy were included in the study. Exclusion criteria included women < 18 years at first diagnosis, women with relapsed EOC or receiving palliative surgery, women with non-epithelial tumours, and those presenting at first diagnosis with early stage EOC. The patient cohort, the MDT consensus and the hospital setting have been previously described in detail [12,13]. All operations were carried out via a midline laparotomy by a team of gynaecological and, when necessary, hepatobiliary or colorectal surgeons with an attempt to achieve total macroscopic clearance. Early intra-operative assessment of tumour dissemination was routinely performed and retrospectively documented in the operative notes prior to textual data entry in the ovarian cancer database. Ethics board approval was obtained through the Leeds Teaching Hospitals Trust (MO20/133163/18.06.20). The study was added to the UMIN/CTR Trial Registry (UMIN000049480).
The operative report was a direct adaptation of the structured ESGO ovarian cancer operative report template that included an exhaustive list of pelvic, lower abdomen and upper abdomen surgical procedures [8]. All the regions of the abdominal and pelvic cavity (ovaries, tubes, uterus, pelvic peritoneum, paracolic gutters, anterior parietal peritoneum, mesentery, peritoneal surface of the colon and bowel, liver, spleen, greater and lesser omentum, porta hepatis, stomach, Morison's pouch, lesser sac, surface of both hemidiaphragms, pelvic and para-aortic lymph nodes, and if applicable pleural cavity) were evaluated and described [13]. During the study years, systematic pelvic and para-aortic lymph node dissection or sampling was routinely performed, particularly in the presence of bulky lymph nodes. When applicable, the size and location of residual disease at the end of the operation, and the reasons for not achieving complete cytoreduction, were reported. An ESGO-approved template was available on the ESGO website (https://guidelines.esgo.org/, accessed on 23 April 2023).
Two separate analyses were performed. Firstly, all cases were analysed to audit the trends of surgical procedures performed over time in both the primary and interval debulking setting. Secondly, the most important predictive feature was interrogated against commonly used engineered features including the peritoneal carcinomatosis index (PCI) and the intra-operative mapping for ovarian cancer (IMO) score, in addition to the SCS. The PCI and IMO scores were calculated at the beginning of surgery to describe the intraoperative location of the disease [14,15]. We did not perform propensity score matching, as recent evidence suggests the performance of these procedures does not significantly change in the interval cytoreductive surgery group [16].
Descriptive statistics were used to summarize the clinical characteristics of patients and their respective cytoreductions. Continuous variables were summarized with means, standard deviations, medians, and ranges. The Kruskal-Wallis test was used to compare groups with respect to median values. Categorical variables were summarized with counts and percentages. Fisher's exact test was used to compare groups with respect to categorical variables. Progression-free survival (PFS) was defined as the time (months) from the date of initial diagnosis to the date of progression or recurrence. Patients who were alive without progression or recurrence were censored on the date of last clinical assessment. Overall survival (OS) was defined as the time (months) from the date of initial diagnosis to the date of death. Patients who were alive were censored on the date of last follow-up. We used the Kaplan and Meier (K-M) method to estimate median PFS and OS stratified by various potential prognostic factors and the log-rank test to detect associations between variables and outcomes. Multivariate analysis using the Cox proportional hazards method was performed to identify potential independent risk factors for recurrence and mortality. Pearson's correlation (r²) was used to describe the associations amongst numerical variables and heatmaps were produced to illustrate the correlations. All tests were two-sided, and significance was determined at the 0.05 level.
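The Kaplan-Meier estimate and the handling of censored patients described above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the function names `kaplan_meier` and `median_survival` and the toy follow-up data are hypothetical, chosen only to show how censoring enters the product-limit estimate and how a median survival time is read from the curve.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up in months from diagnosis
    events : 1 if the event (progression or death) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower (None if not reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy cohort (not study data): months of follow-up and event flags
months = [6, 7, 10, 15, 19, 25]
observed_event = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(months, observed_event)
median = median_survival(curve)
```

Censored patients (flag 0) never trigger a drop in the curve, but they do shrink the risk set for later event times, which is why the steps become larger towards the end of follow-up.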

Model Development
The eXtreme Gradient Boosting (XGBoost) algorithm was employed to model the features [17]. This combines all the generated hypotheses of weak learning algorithms into a single hypothesis to boost performance. The combined effect of eight parameters to maximize model performance was investigated by evaluating a grid of combinations of values using Scikit-learn's GridSearchCV function.
The dataset was split into training and test cohorts (70%:30% ratio). A five-fold stratified cross-validation (CV) was performed and stratified folds were constructed to overcome data imbalance. The CV was iterated to decrease both variance and bias. Model performance was assessed by measuring the total area under the receiver-operating curve (AUC). Receiver operating characteristic (ROC) and Precision-Recall curves and state-of-the-art scores were used for performance metrics.
To explain the predictive model, the artificial intelligence SHapley Additive exPlanations (SHAP) framework was employed. The methodology enhances interpretability by computing the importance values for each feature on individual predictions; in other words, it explains how much the presence of a feature contributes to the model's overall prediction [18]. The framework interprets the model of the entire cohort as a linear function of features. Visual quantification of the model prediction was demonstrated by producing (a) SHAP summary plots for the global (cohort) explanation of the results; (b) SHAP dependence plots of the critical risk features pertinent to the prediction. The Python programming language (available at http://www.python.org, accessed on 12 July 2023) was used for the analyses.
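As a rough illustration of what an exhaustive grid search does, the sketch below enumerates every combination of a small parameter grid and keeps the best-scoring one. It is a simplified stand-in, not the study's configuration: the helper `grid_search`, the three-parameter grid (the study tuned eight parameters that are not listed here) and the scoring function `toy_score` are all hypothetical. In practice the score would be a cross-validated AUC, as GridSearchCV computes internally.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score, n_tried)."""
    names = sorted(param_grid)
    best_params, best_score, n_tried = None, float("-inf"), 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        n_tried += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_tried


# Hypothetical grid: the names mirror common XGBoost settings, the values are illustrative
grid = {
    "max_depth": [2, 3, 4],
    "learning_rate": [0.01, 0.1],
    "n_estimators": [100, 200],
}


def toy_score(params):
    """Stand-in for a mean cross-validated AUC (assumption, not a real model fit)."""
    return (
        0.70
        - 0.01 * abs(params["max_depth"] - 3)
        - abs(params["learning_rate"] - 0.1)
        + params["n_estimators"] / 10000
    )


best_params, best_score, n_tried = grid_search(grid, toy_score)
```

The number of fits grows multiplicatively with each parameter (here 3 × 2 × 2 = 12 combinations, before multiplying by the number of CV folds), which is the main cost of searching eight parameters jointly.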
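The two ideas in this paragraph, stratified folds and the AUC, can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions rather than the study's pipeline: `roc_auc` and `stratified_folds` are hypothetical helper names, the labels and predicted probabilities are made-up toy values, and the fold assignment is a simple round-robin within each class without shuffling.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k):
    """Give each sample a fold index so every fold keeps roughly the overall class ratio."""
    folds = [0] * len(y)
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        for position, i in enumerate(members):
            folds[i] = position % k  # round-robin within each class
    return folds


# Hypothetical toy data: 1 = CC0 achieved, 0 = not achieved
y = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
predicted = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7, 0.6, 0.35]

auc = roc_auc(y, predicted)
folds = stratified_folds(y, 5)
```

Because folds are assigned within each class, an imbalanced outcome cannot leave a fold without any positive cases, which is the purpose of stratification mentioned above.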
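To make the notion of a Shapley value concrete, the sketch below computes exact values by enumerating every coalition of features for a tiny toy model. It is not the SHAP library and not the study's model: `shapley_values` and `toy_model` are hypothetical, the contributions (0.3, 0.1, and a 0.1 interaction) are invented for illustration, and "PLND" is used here only as a shorthand label for pelvic lymph node dissection. SHAP uses efficient algorithms for tree ensembles; brute-force enumeration like this is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for size in range(n):
            # weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = set(subset)
                gain = value_fn(without | {feature}) - value_fn(without)
                total += weight * gain
        phi[feature] = total
    return phi


def toy_model(present):
    """Hypothetical predicted probability of CC0 given the set of procedures performed."""
    output = 0.2  # baseline with no procedures
    if "UAP" in present:
        output += 0.3
    if "PLND" in present:
        output += 0.1
    if "UAP" in present and "PLND" in present:
        output += 0.1  # interaction, shared equally between the two features
    return output


procedures = ["UAP", "PLND", "splenectomy"]
phi = shapley_values(toy_model, procedures)
```

The values sum to the difference between the prediction with all features present and the baseline (0.7 − 0.2 = 0.5 in this toy case), which is the additivity property that allows a model's output to be read as a sum of per-feature contributions.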

Model Comparison
When UAP alone was used to predict CC0, the ROC curve showed that UAP could effectively distinguish cytoreductive outcome (AUC = 0.78, CI: 0.76-0.81). When UAP alone was incorporated in a composite model comprising engineered features, it substantially enhanced the model's predictive value (AUC = 0.80, CI: 0.76-0.84) (Figure 6).

Figure 7. Cohort survival outcomes analyzed according to the occurrence of UAP (blue = UAP cohort; orange = non-UAP cohort): (A) progression-free survival; (B) overall survival. Note the shape difference between the concave (UAP group) and the sinusoidal (non-UAP group) curves. Hazard ratio (HR) and 95% confidence interval (CI) for prospective log-linear associations (Cox regression) between (C) recurrence and non-recurrence and (D) fatal and non-fatal outcomes, including the UAP and commonly used engineered features. The shape of the curves rather than the hazard ratio can be used to quantify the benefit from the intervention. In contrast, a relatively small hazard ratio (concave curves) can yield large intervention effects reflected by longer median survival times for 50% of patients.

Discussion
Surgeons are significantly challenged by EOC heterogeneity. There is an increasing need for tools to better tailor treatment strategies by improving the predictions of the surgical outcomes. By scrutinizing a validated but exhaustive list of surgical sub-procedures outside the "box standard" surgery for ovarian cancer, we aligned with the recently published NICE guidelines on maximal cytoreductive surgery [19] and successfully quantified the complexity of surgery, as highlighted in our proposed classification algorithm (Figure 8). By categorising critical procedures, we highlighted the potential key role of upper abdominal peritonectomy (UAP), a complex and technically demanding surgical procedure. Using a large dataset of women with advanced EOC who underwent cytoreductive surgery, we developed and validated an ML algorithm, which demonstrates satisfactory predictive performance but, more importantly, identifies UAP as the most important procedural indicator of CC0 in surgically cytoreduced EOC women. In contrast to the Aletti SCS, which supported an arbitrary allocation of a higher score for complex procedures [6], our devised ML model supported the feature selection and weighted importance of all surgical sub-procedures irrespective of the individual practice. Nevertheless, UAP, if solely used, did not yield any survival benefit. We found that UAP best correlated with bladder peritonectomy and diaphragmatic stripping. That said, in selected patients, the procedure should be offered not in isolation but as part of a "surgical package". The result is not surprising. The right upper quadrant is mostly affected by cancer metastases. Therefore, dissection of upper abdominal disease is critical at advanced EOC cytoreductive surgery. Fundamental anatomical knowledge and great expertise are required to appreciate the critical vascular landmarks prior to dissection [20]. Disease > 1 cm involving the upper abdomen above the greater omentum has been found in a recent study [21]. A comprehensive approach to surgical cytoreduction should incorporate upper abdominal resection [22]. We acknowledge that adequate exposure is critical to allow for complete resection. In our centre, initiation of the paradigm shift towards more complex multi-visceral surgery in the years 2016 and 2017 allowed for a more thorough early intra-operative examination by mobilizing the liver and other organs and exposing the pouch of Morison [12]. Diaphragmatic involvement is estimated in up to 40% of these cases [23]. Sugarbaker originally described various peritonectomy procedures, often warranted for maximal surgical cytoreduction [24]. He defined UAP as the resection of parietal and visceral peritoneum in the upper abdomen. On the right upper quadrant, that would involve stripping from the right subhepatic space and from the surface of the liver, in addition to the right hemidiaphragm; on the left upper quadrant, stripping over the left adrenal gland and pre-renal fat, and lesser omentectomy, in addition to the left diaphragm and spleen. Subphrenic peritonectomies on both sides allow for visualisation of the pancreas. Of those, lesser omentectomy with stripping of the omental bursa appears to be the most difficult due to the proximity of vital structures. Radical peritonectomies with en-bloc resection of extensive widespread diaphragmatic peritoneal carcinomatosis have also been described [25]. Herein, we considered diaphragmatectomies as separate sub-procedures. Centralised surgical care is the best strategy to optimise oncologic outcomes with acceptable morbidity, even for those patients with high disease burden [26].
Overall, the study indicates that certain surgical procedures, and not the overall surgical load, are predictive of the likelihood for CC0. In addition to UAP, the top feature ranking was complemented by regional pelvic and para-aortic lymphadenectomies. Historically, nodal dissections were associated with long-term survival [27]. Between 2014 and 2019, the results of the LION trial were not available [28]. Therefore, during surgical cytoreduction in either the upfront or the delayed setting, the bilateral pelvic and para-aortic regions were systematically assessed, and consequently, systematic lymphadenectomy was rather routinely performed. Following publication of the LION trial results, routine lymphadenectomy is not warranted, as it does not confer a survival benefit unless there is evidence of macroscopically or radiologically enlarged lymph nodes. Disease distribution in the omental bursa, pancreatic surface, caudate lobe and portal triad is not an absolute contraindication for debulking, unless there is deep infiltration of the porta hepatis or the celiac trunk [29].
The established benefit of upper abdominal cytoreduction in advanced EOC has been demonstrated even for optimal cytoreduction [30,31]. In our study, we failed to demonstrate a survival benefit from the sole performance of UAP. At first glance, this looks odd. We explained why UAP should be offered as part of a "surgical package" to selected patients. It appears that any transient benefit is potentially outweighed by a high disease load in that cohort of patients. When discussing the potential benefits from UAP, the focus should shift from the hazard ratio to the shape of the probability distribution, which is disease related. Although the hazard ratio can be helpful for statistical hypothesis testing of the benefit from the procedure, other measures such as median times to the study endpoint are important and particularly useful when the event of interest, i.e., OS, may eventually occur across the entire cohort. Then the risk for death is no longer an issue [32]. In our study, although UAP increased the hazard rate for PFS, the treatment effect was larger because >50% of the patients did not have a relapse at the time. Our CC0 rates are not inferior to those of other well-established high-volume centers [31].

Strengths and Limitations
The study supported the current paradigm shift for organised centralisation of services, moving away from the traditional patterns of cytoreductive surgery. A strength of the study was its design, which allowed the importance of the individual procedures to be weighted as outcome indicators. The cohort has been extensively scrutinised [12,13].
We applied XAI frameworks to explain the modelling "black box", but also reported quantitative results not strictly included under the XAI umbrella, such as Cox regression [33]. We did not assess the morbidity of the surgical procedures, but it is assumed to vary, as others have demonstrated a wide range in complication rates [34]. We are cautious about the generic application of our results in modern EOC care. Surgical experience and institutional capacity in the management of these patients may influence outcome rates and complications related to the incorporation of upper abdominal surgery [35]. Indeed, within our own practice, we observed variations in the surgical effort extended at complete resection. Nevertheless, the study was designed in such a way as not to reflect individual practice. Data from pre-operative imaging were not included in the study because the miliary or plaque-like morphology of the peritoneal disease often makes it undetectable by imaging [36]. Disease in the upper abdomen does not come without involvement of the lower regions [37]. To achieve complete clearance, we stress the need for thorough exploration and visual inspection of the upper abdominal cavity early at surgery to resect all disease sites. Finally, if a robust surgical reference is to be generated in high-output tertiary centres, a larger number of samples will be required.

Conclusions
We employed and explained an ML methodology for predicting the key surgical interventions required to achieve CC0. We identified UAP as the most salient procedural indicator of CC0 in surgically cytoreduced EOC women. The upper abdominal quadrants should be thoroughly inspected to ensure that CC0 is achievable.

Figure 1. (A) Receiver Operator Characteristic (ROC) curve showing the diagnostic accuracy of all the surgical sub-procedures for the prediction of complete cytoreduction (AUC = 0.63). (B) Precision-Recall curve and Average Precision performance value (AP = 0.44).
The feature importance based on SHAP values is shown in Figure 2. The order of features reflects their weighted importance across the entire cohort (global explainability). The position on the y-axis is determined by the feature and on the x-axis by the Shapley value. The colour represents the value of the feature from low (blue = CC0 or yes) to high (red = not CC0 or no). The top three features included para-aortic lymph node dissection, UAP and pelvic lymph node dissection. Their longer tails compared to other features demonstrate their importance for specific, if not all, patients (local explainability).

Figure 2. Model classification differences explained by the SHAP values. (A) Summary plot showing feature distribution plots based on the sum of SHAP value magnitudes over all samples. The color represents the feature value (red = not CC0 or no resection; blue = CC0 or yes resection) and the x-axis represents the impact score according to binary output. (B) Standard bar plot of the mean absolute SHAP values for each feature showing the average impact on the global model output. SHAP, Shapley Additive explanations; CC, Complete Cytoreduction.

Figure 3. (A) Feature importance plot showing the relevance of each variable to the CC0 prediction when screened using random forest. (B) Correlation heatmap demonstrating the pairwise correlations amongst the surgical procedures. The Pearson correlation (r²) was used. CC, complete cytoreduction.

Figure 5. Dependence plots demonstrating clear inflection points for various regional lymph node dissections.

Figure 6. Performance metrics of devised models for the prediction of complete cytoreduction. (A) UAP. (B) Composite model comprising UAP and commonly used engineered features.

Figure 8. Study flowchart. The probability of achieving complete cytoreduction (CC0) can be well quantified by an ML-driven model inclusive of all surgical sub-procedures. Upper abdominal peritonectomy is the most important predictive feature. A "surgical package" of maximal effort targeted cytoreduction including upper abdominal peritonectomy should be offered in selected patients. Thorough inspection of upper abdominal quadrants to ensure that CC0 is achievable reflects good clinical practice. ML: Machine Learning.

Table 1. Descriptive statistics of the performed surgical sub-procedures.